Linguistically engineered tools for speech recognition error analysis

نویسندگان

  • Carol Van Ess-Dykema
  • Klaus Ries
چکیده

In order to improve Large Vocabulary Continuous Speech Recognition (LVCSR) systems, it is essential to discover exactly how our current systems are underperforming. The major intellectual tool for solving this problem is error analysis: careful investigation of just which factors are contributing to errors in the recognizers. This paper presents our observations of the effects that discourse (i.e., dialog) modeling has on LVCSR system performance. As our title indicates, we emphasize the recognition error analysis methodology we developed and what it showed us as opposed to emphasizing development of the discourse model itself. In the first analysis of our output data, we focussed on errors that could be eliminated by Dialog Act discourse tagging [JSB97] using Dialog Act-specific language models. In a second analysis, we manipulated the parameterization of the Dialog Act-specific language models, enabling us to acquire evidence of the constraints these models introduced. The word error rate did not significantly decrease since the error rate in the largest category of Dialog Acts, namely Statements, did not significantly decrease. We did, however, observe significant error reduction in the less frequently occurring Dialog Acts and we report on the characteristic of the error corrections. We discovered that discourse models can introduce simple syntactic constraints and that they are most sensitive to parts of speech. To appear in: International Conference on Spoken Language Processing (ICSLP’98), Sidney, Australia, 30th November-4th December 1998

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

شبکه عصبی پیچشی با پنجره‌های قابل تطبیق برای بازشناسی گفتار

Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

متن کامل

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

متن کامل

Incorporating voice onset time to improve letter recognition accuracies

We consider the possibility of incorporating distinctive features into a statistically based speech recognizer. We develop a two pass strategy for recognition with a standard HMM based first pass followed by a second pass that performs an alternative analysis to extract class-specific features. For the voiced/voiceless distinction on stops for an alphabet recognition task, we show that a lingui...

متن کامل

Subword-based Automatic Lexicon Learning for ASR

We present a framework for learning a pronunciation lexicon for an Automatic Speech Recognition (ASR) system from multiple utterances of the same training words, where the lexical identities of the words are unknown. Instead of only trying to learn pronunciations for known words we go one step further and try to learn both spelling and pronunciation in a joint optimization. Decoding based on li...

متن کامل

Improving the Arabic Pronunciation Dictionary for Phone and Word Recognition with Linguistically-Based Pronunciation Rules

In this paper, we show that linguistically motivated pronunciation rules can improve phone and word recognition results for Modern Standard Arabic (MSA). Using these rules and the MADA morphological analysis and disambiguation tool, multiple pronunciations per word are automatically generated to build two pronunciation dictionaries; one for training and another for decoding. We demonstrate that...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1998